Landscape encodings enhance optimization 
Hard combinatorial optimization problems deal with the search for the minimum cost solutions 
(ground states) of discrete systems under strong constraints. A transformation of state variables may 
enhance computational tractability. It has been argued that these state encodings are to be chosen 
invertible to retain the original size of the state space. Here we show how redundant non-invertible 
encodings enhance optimization by enriching the density of low-energy states. In addition, smooth 
landscapes may be established on encoded state spaces to guide local search dynamics towards the 
ground state. 



I. INTRODUCTION 

Complex systems in our world are often computationally
complex as well. In particular, the class of NP-complete
problems [1], for which no fast solvers are
known, encompasses not only a wide variety of well- 
known combinatorial optimization problems from the 
Travelling Salesman Problem to graph coloring, but also 
includes a rich diversity of applications in the natural
sciences ranging from genetic networks [2] through protein
folding [3] to spin glasses [4-7]. In such cases, heuristic
optimization - where the goal is to find the best solution 
that is reachable within an allocated time - is widely ac- 
cepted as being a more fruitful avenue of research than 
attempting to find an exact, globally optimal, solution. 
This view is motivated at least in part by the realiza- 
tion that in physical and biological systems, there are 
severe constraints on the type of algorithms that can 
be naturally implemented as dynamical processes. Typi- 
cally, thus, we have to deal with local search algorithms. 
Simulated annealing [8], genetic and evolutionary
algorithms [9], as well as genetic programming [10] are the
most prominent representatives of this type. Their common
principle is the generation of variation by thermal
or mutational noise, and the subsequent selection of
variants that are advantageous in terms of energy or fitness.

The performance of such local search heuristics naturally
depends on the structure of the search space, which,
in turn, depends on two ingredients: (1) the encoding of
the configurations and (2) a move set. Many combinatorial
optimization problems as well as their counterparts in
statistical physics, such as spin glass models, admit a
natural encoding that is (essentially) free of redundancy. In
the evolutionary computation literature this "direct
encoding" is often referred to as the "phenotype space", X.
The complexity of optimizing a cost function f over X is
determined already at this level. For simplicity, we call f
energy and refer to its global minima as ground states. In
evolutionary computation, one often uses an additional
encoding Y, called the "genotype space", on which search
operators, such as mutation and cross-over, are defined
more conveniently [12, 13]. The genotype-phenotype
relation is determined by a map
$\alpha: Y \to X \cup \{\emptyset\}$, where $\emptyset$
represents phenotypic configurations that do not occur
in the original problem, i.e. non-feasible solutions.
For example, the tours of a Traveling Salesman Problem
(TSP) [1] are directly encoded as permutations describing
the order of the cities along the tour. A frequently
used encoding as binary strings represents every connection
between cities as a bit that can be present or absent
in a tour; of course, most binary strings do not refer to
valid tours in this picture.

The move set (or more generally the search operators
[15]) defines a notion of locality on X. Here we are
interested only in mutation-based search, where for each
$x \in X$ there is a set of neighbors N(x) that is reachable
in a single step. Such neighboring configurations are said
to be neutral if they have the same fitness. Detailed
investigations of fitness landscapes arising from molecular
biology have led to the conclusion that high degrees of
neutrality can facilitate optimization [16]. More
precisely, when populations are trapped in a metastable
phenotypic state, they are most likely to escape by crossing
an entropy barrier, along long neutral paths that traverse
large portions of genotype space [17].

In contrast, some authors advocate the use of "synonymous
encodings" for the design of evolutionary algorithms,
where genotypes mapping to the same phenotype
$x \in X$ are very similar, i.e., $\alpha^{-1}(x)$ forms a
local "cluster" in Y, see e.g. [13, 18, 19]. This picture
is incompatible with the advantages of extensive neutral
paths observed in biologically inspired landscape models
[16, 20] and in genetic programming [31, 52]. An
empirical study [23], furthermore, shows that the
introduction of arbitrary redundancy (by means of random
Boolean network mapping) does not increase the
performance of mutation-based search. This observation can be
understood in terms of a random graph model of neutral
networks, in which only very high levels of randomized
redundancy result in the emergence of neutral paths [21].
An important feature that appears to have been overlooked
in most recent literature is that the redundancy
of Y with respect to X need not be homogeneous [12].
Inhomogeneous redundancy implies that the size of the
preimage $|\alpha^{-1}(x)|$ may depend on $x \in X$. If
$|\alpha^{-1}(x)|$ is
anti-correlated with the energy f(x), then the encoding
Y enables the preferential sampling of low-energy states 
in X. Thus even a random selection of a state yields lower 
energy when performed in Y than in X. Here we demon- 
strate this enrichment of low energy states for three es- 
tablished combinatorial optimization problems and suit- 
ably chosen encodings. The necessary formal aspects of 
energy landscapes and their encodings are outlined in the 
Methods section. We formalize and measure enrichment 
in terms of densities of states on X and Y, see Methods 
for a formal treatment. We illustrate the effects of encod- 
ing by comparing performance of optimization heuristics 
on the direct and encoded landscapes. 
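
The sampling claim above can be made explicit with a short calculation. The sketch below assumes, for simplicity, that every y in Y encodes a feasible state, and introduces the shorthand $m(x) = |\alpha^{-1}(x)|$ (not a symbol used elsewhere in the text); averages and the covariance with subscript X are taken under uniform sampling of X.

```latex
\begin{align*}
\langle f \circ \alpha \rangle_Y
  &= \frac{1}{|Y|} \sum_{y \in Y} f(\alpha(y))
   = \frac{1}{|Y|} \sum_{x \in X} m(x)\, f(x), \\
\langle m \rangle_X
  &= \frac{1}{|X|} \sum_{x \in X} m(x) = \frac{|Y|}{|X|}, \\
\langle f \circ \alpha \rangle_Y - \langle f \rangle_X
  &= \frac{|X|}{|Y|}\, \mathrm{Cov}_X(m, f).
\end{align*}
```

Under these assumptions, a negative covariance between preimage size and energy lowers the mean energy of a uniformly drawn encoded state relative to a uniformly drawn state of X.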



II. RESULTS AND DISCUSSION 



A. Number Partitioning

The first optimization problem we consider is the number
partitioning problem (NPP) [1]: this asks if one can
divide n positive numbers $a_1, a_2, \ldots, a_n$ into two subsets
such that the sum of elements in the first subset is the
same as the sum over elements in the other subset. The
energy is defined as the deviation from equal sums in the
two subsets, i.e.,

$$f(x) = \left| \sum_{i=1}^{n} a_i x_i \right| \qquad (1)$$

where the two choices $x_i \in \{-1, +1\}$ correspond to
assignment to the first or to the second subset, respectively.
The flipping of one of the spin variables $x_i$ is used as a
move set, so that the NPP landscape is built on a
hypercube. The NPP shows a phase transition between an
easy and a hard phase. We consider here only instances
that are hard in practice, i.e., where the coefficients $a_i$
have a sufficiently large number of digits [25].

The so-called prepartitioning encoding [26] of the NPP
is based on the differencing heuristic by Karmarkar and
Karp [37]. Departing from an NPP instance $(a_1, \ldots, a_n)$,
the heuristic removes the largest number, say $a_i$, and the
second largest $a_j$ and replaces them by their difference
$a_i - a_j$. This reduces the problem size from n to n-1.
After iterating this differencing step n-1 times, the single
remaining number is an upper bound for, and in many
cases a good approximation to, the global minimum
energy. The minimizing configuration itself is obtained
by keeping track of the items chosen for differencing.
Replacing $a_i$ and $a_j$ by their difference amounts to putting
$a_i$ and $a_j$ into different subsets, i.e. $x_i \ne x_j$.

The prepartitioning encoding is obtained by modifying
the initial condition of the heuristic. Each number
$a_i$ is assigned a class $y_i \in \{1, \ldots, n\}$. A new NPP
instance $a'_1, \ldots, a'_n$ is generated by adding up all numbers
$a_i$ in the same class $y_i$ into a single number $a'_{y_i}$. After
removing zeros from a', the differencing heuristic is applied
to a'. In short: $y_i = y_j$ imposes the constraint $x_i = x_j$.
Running the heuristic under this constraint, the resulting
configuration $x = \alpha(y)$ is unique up to flipping all spins
in x. The so defined mapping $\alpha: Y \to X$ is surjective
because for each $x \in X$, $\alpha(y) = x$ for $y_i = 1$ if $x_i = 1$ and
$y_i = 2$ otherwise. Two encodings $y, z \in Y$ are neighbors
if they differ at exactly one index $i \in \{1, \ldots, n\}$. This
encoding is the one whose performance we will compare
with the direct encoding later.

B. Traveling Salesman

Our next optimization problem, the Traveling Salesman
Problem (TSP), is another classical NP-hard
optimization problem [1]. Given a set of n vertices (cities,
locations) {1, ..., n} and a symmetric matrix of distances
or travel costs $d_{ij}$, the task is to find a permutation (tour)
$\pi$ that minimizes the total travel cost

$$f(\pi) = \sum_{i=1}^{n} d_{\pi(i), \pi(i+1)} \qquad (2)$$

where indices are interpreted modulo n. Here, the states
of the landscape are the permutations of {1, ..., n},
$X = S_n$. Two permutations $\pi$ and $\sigma$ are adjacent,
$\{\pi, \sigma\} \in L$, if they differ by one reversal. This means
that there are indices i and j with i < j such that
$\sigma_k = \pi_{i+j-k}$ for $i \le k \le j$ and $\sigma_k = \pi_k$ otherwise.

Similar to the NPP case, an encoding configuration
$y \in Y := \{1, 2, \ldots, n\}^n$ acts as a constraint. A tour
$\pi \in X$ fulfills y if for all cities i and j, $y_i < y_j$ implies
$\pi^{-1}(i) < \pi^{-1}(j)$. Thus $y_i$ is the relative position of city
i in the tour since it must come after all cities j with
$y_j < y_i$. All cities with the same y-value appear in a
single section along the tour. If there are no two cities
with the same y-value then y itself is a permutation and
there is a unique $\pi \in X$ obeying y, namely $\pi = y^{-1}$.

Among the tours compatible with the constraint, a se- 
lection is made with the greedy algorithm. It constructs 
a tour by iteratively fixing adjacencies of cities. Starting 
from an empty set of adjacencies, we attempt to include 
an adjacency {i,j} at each step. If the resulting set of 
adjacencies is still a subset of a valid tour obeying the 
constraint, the addition is accepted, otherwise {i, j} is
discarded. The step is iterated, proposing each {i, j}
exactly once in the order of increasing $d_{ij}$. This procedure
establishes a mapping (encoding) $\alpha: Y \to X$. Since each
tour $\pi$ can be reached by taking $y = \pi^{-1}$, $\alpha$ is complete.
In the encoded landscape, two states $y, z \in Y$ are
adjacent if they differ at exactly one position (city) i.
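
The greedy decoding step can be illustrated with a small Python sketch. It is not the authors' implementation: cities are numbered from 0, the function names are hypothetical, adjacencies are proposed shortest first (the usual greedy edge ordering, taken here as an assumption), and feasibility is checked by brute-force enumeration of all tours obeying y, which is only practical for very small n.

```python
import itertools

def constrained_tours(y):
    """Yield every tour (cities in visiting order) that obeys the constraint y."""
    blocks = {}
    for city, label in enumerate(y):
        blocks.setdefault(label, []).append(city)
    # Blocks appear in order of increasing label; order inside a block is free.
    ordered = [blocks[label] for label in sorted(blocks)]
    per_block = [itertools.permutations(block) for block in ordered]
    for parts in itertools.product(*per_block):
        yield tuple(city for part in parts for city in part)

def tour_edges(tour):
    """Set of undirected adjacencies of a closed tour."""
    n = len(tour)
    return {frozenset((tour[k], tour[(k + 1) % n])) for k in range(n)}

def tour_length(tour, d):
    """Total travel cost of the closed tour, indices taken modulo n."""
    n = len(tour)
    return sum(d[tour[k]][tour[(k + 1) % n]] for k in range(n))

def greedy_decode(y, d):
    """Greedy selection among the tours compatible with y (brute-force check)."""
    n = len(y)
    survivors = [(tour, tour_edges(tour)) for tour in constrained_tours(y)]
    pairs = sorted(itertools.combinations(range(n), 2),
                   key=lambda p: d[p[0]][p[1]])
    for i, j in pairs:
        edge = frozenset((i, j))
        kept = [(t, e) for t, e in survivors if edge in e]
        if kept:  # the adjacency is still part of some valid tour: accept it
            survivors = kept
    return survivors[0][0]
```

For example, with `d = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]`, the call `greedy_decode([0, 0, 0, 0], d)` imposes no ordering between cities, whereas `greedy_decode([2, 0, 1, 3], d)` admits a single tour. An efficient implementation would track path fragments instead of enumerating tours.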

C. Maximum Cut

The last example we consider is a spin glass problem.
Consider the set of configurations $X = \{-1, +1\}^n$ with
the energy function

$$f(x) = -\sum_{i<j} J_{ij} x_i x_j \qquad (3)$$

for a spin configuration $x \in X$. Proceeding differently
from the usual Gaussian or $\pm J$ spin glass models [25, 29],
we allow the coupling to be either antiferromagnetic or
zero, $J_{ij} \in \{-1, 0\}$. This is sufficient to create
frustration and obtain hard optimization problems. Taking the
negative coupling matrix $-J$ as the adjacency matrix of
a graph G, the spin glass problem is equivalent to the
max-cut problem on G, which asks to divide the node
set of G into two subsets such that a maximum number
of edges runs between the two subsets [1].

The idea for an encoding works on the level of the
graph G, which we assume to be connected. The set Y
of the encoding consists of all spanning trees of G. In the
mapped configuration $x = \alpha(y)$, $x_i$ and $x_j$ have different
spin values whenever ij is an edge of the spanning tree y.
Since a spanning tree is a connected bipartite graph, this
uniquely (up to +1/-1 symmetry) defines the spin
configuration x. The encoding $\alpha$ is not complete in general.
Homogeneous spin configurations, for instance, are not
generated by any spanning tree. Each ground state
configuration $x_{\mathrm{ground}}$, however, is certain to be represented
by a spanning tree due to the following argument.
Suppose there is a minimum energy configuration $x_{\mathrm{ground}}$
that is not generated by any spanning tree. Then the
subgraph of G formed by all edges connecting unequal
spins in $x_{\mathrm{ground}}$ is disconnected. We choose one of the
connected components, calling its node set C. By
flipping all spins in C, we keep all edges that connect
unequal spins in $x_{\mathrm{ground}}$. Since G is connected, we obtain
at least one additional such edge from a node in C to a
node outside C. Thus we have constructed a configuration
with strictly lower energy than $x_{\mathrm{ground}}$, a contradiction.
Two spanning trees $y, z \in Y$ are adjacent if z can be
obtained from y by addition of an edge e and removal of
a different edge e'.
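
The map from a spanning tree to a spin configuration amounts to a two-coloring of the tree. The following Python sketch illustrates it under stated assumptions: nodes are numbered from 0, node 0 is given spin +1 to fix the global sign, each edge of G is counted once in the energy, and the function names are hypothetical.

```python
from collections import deque

def tree_to_spins(n, tree_edges):
    """Spin configuration in which the two ends of every tree edge differ."""
    neighbors = [[] for _ in range(n)]
    for i, j in tree_edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    spins = [0] * n          # 0 marks a node that has not been reached yet
    spins[0] = 1             # convention that removes the +1/-1 symmetry
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if spins[j] == 0:
                spins[j] = -spins[i]
                queue.append(j)
    return spins

def energy(graph_edges, spins):
    """Energy with coupling -1 on every edge of G: sum of x_i * x_j over edges."""
    return sum(spins[i] * spins[j] for i, j in graph_edges)

def cut_size(graph_edges, spins):
    """Number of edges of G running between the two spin classes."""
    return sum(1 for i, j in graph_edges if spins[i] != spins[j])
```

With this convention the energy equals the number of edges of G minus twice the cut size, so minimizing the energy maximizes the cut.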

D. Enrichment 

We now study enrichment as well as landscape structure
on these three rather different problems. To this
end, we consider the cumulative density of states

$$Q_f(\eta) = \frac{|\{x \in X : f(x) \le \eta\}|}{|X|} \qquad (4)$$

in the original landscape and $Q_{f \circ \alpha}$ defined analogously in
the encoded landscape. In order to quantify the enrichment
of good solutions, we compare the fraction h of all
states with an energy not larger than a certain threshold
$\eta$ in the original landscape with the fraction r(h) using
the same threshold in the encoding. The encoding thus
enriches low energy states if $r(h) \gg h$ for small h.
Figure 1 shows that this is the case for the three landscapes
and encodings considered here. We find in fact that the
density of states is enriched by a factor r(h)/h of several
orders of magnitude in the encoded landscape, for all the
cases considered.

Reassuringly, this trend of enrichment persists all the 
way to the ground state: that is, the encodings contain 
many more copies of the ground state than the origi- 
nal landscape. It appears in fact that the enrichment of 
ground states increases exponentially with system size. 
We can thus conclude that, with an appropriately chosen
encoding, it is easier both to find lower energy states
from higher energy ones, which provides more routes to
the ground state, and to reach the ground state itself
from a low-energy neighbor, as a result of enrichment.
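
As an illustration of how the enrichment curve (h, r(h)) could be estimated from two lists of energies, one for states of X and one for the images of states of Y, consider the following minimal Python sketch. The function names are hypothetical, and the lists may come either from complete enumeration or from uniform sampling.

```python
from bisect import bisect_right

def cumulative_density(sorted_energies, eta):
    """Fraction of states with energy not larger than eta, as in Eq. (4)."""
    return bisect_right(sorted_energies, eta) / len(sorted_energies)

def enrichment_curve(energies_x, energies_y):
    """Return (h, r(h)) pairs, one for each distinct energy level found in X."""
    ex = sorted(energies_x)
    ey = sorted(energies_y)
    curve = []
    for eta in sorted(set(ex)):
        h = cumulative_density(ex, eta)   # fraction in the original landscape
        r = cumulative_density(ey, eta)   # fraction in the encoded landscape
        curve.append((h, r))
    return curve
```

For instance, `enrichment_curve([0, 1, 2, 3], [0, 0, 0, 1, 1, 2, 3, 3])` describes a toy encoding in which the lowest level makes up a quarter of X but three eighths of Y.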



E. Neighborhoods and neutrality 

We continue the analysis of the encodings with atten- 
tion to geometry and distances. A neutral mutation is a 
small change in the genotype that leaves the phenotype 
unaltered. In the present setting, a neutral move in the 
encoding is an edge $\{y, z\} \in M$ such that $\alpha(y) = \alpha(z)$.
In general, the set of neutral moves is a subclass of all
moves leaving the energy unchanged. An edge $\{y, z\}$
with $f(\alpha(y)) = f(\alpha(z))$ but $\alpha(y) \ne \alpha(z)$ is not a
neutral move in the present context. In the following, we
examine the fraction of neutral moves for the encoded 
landscapes mentioned above. 

Figure 2(a) shows that the fraction of neutral moves
approaches a constant value when increasing the problem
size of NPP and max-cut. The fraction of neutral
moves in the traveling salesman problem, on the other
hand, decreases as 1/n with problem size n. The average
number of neighbors encoding the same solution grows
linearly with n, since the total number of neighbors is
n(n-1) for each $y \in Y$ in the TSP encoding.

If a move in the encoding is non-neutral, how far does
it take us on the original landscape? We define the step
length of a move $\{y, z\} \in M$ as the distance between the
images of y and z on the original landscape,

$$s(\{y, z\}) = d_X(\alpha(y), \alpha(z)) \qquad (5)$$

using the standard metric $d_X$ on the graph (X, L).
Obviously, {y, z} is neutral if and only if s({y, z}) = 0.
Figure 2(b) compares the cumulative distributions of step
length for number partitioning and max-cut. It is
intractable to get the statistics of s for the TSP
for larger problem sizes since sorting by reversals, i.e.,
measuring distances with respect to the natural move set,
is a known NP-hard problem [50].

FIG. 1: Enrichment of the density of low energy states for landscape encodings. In panels (a,b,c), a point (h, r(h))
on a curve indicates that a fraction h of all states have an energy not larger than a certain threshold $\eta$ in the original landscape
whereas this fraction is r(h) using the same energy threshold in the encoding. Panel (d) shows the average enrichment of the
ground state as a function of problem size for traveling salesman (◇), number partitioning (□), and max-cut (○). Error bars
give the standard deviation over 100 independent realizations. In panels (a-c), the solid curves are for 10 random instances of
each landscape and system size. The dashed lines follow $r(h) \propto h$ in panel (a) and $r(h) \propto h^{3/4}$ in panel (b).

For the encoding of number partitioning, step lengths
are concentrated around n/2. Making a non-neutral
move in this encoding is therefore akin to choosing a
successor state at random. For the max-cut problem, the
result is qualitatively different. Step lengths are broadly
distributed with most moves spanning a short distance
on the original landscape. Based on this it is tempting
to conclude that optimization proceeds in 'smaller steps'
on the max-cut landscape than in the NPP.
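
For number partitioning, the step length of Eq. (5) can be computed directly once the prepartitioning map is available. The Python sketch below is an illustration rather than the authors' code: the function names are hypothetical, ties in the differencing heuristic are broken by item index, and the global spin-flip ambiguity is removed by the assumed convention that the first spin is +1 before taking the Hamming distance.

```python
import heapq

def karmarkar_karp(values):
    """Differencing heuristic: returns (residue, spins) with spins[0] = +1."""
    m = len(values)
    heap = [(-v, k) for k, v in enumerate(values)]  # max-heap via negation
    heapq.heapify(heap)
    neighbors = [[] for _ in range(m)]
    while len(heap) > 1:
        v1, k1 = heapq.heappop(heap)   # largest remaining number
        v2, k2 = heapq.heappop(heap)   # second largest
        neighbors[k1].append(k2)       # k1 and k2 go to different subsets
        neighbors[k2].append(k1)
        heapq.heappush(heap, (v1 - v2, k1))  # stores minus the difference
    spins = [0] * m
    spins[0] = 1
    stack = [0]
    while stack:                       # two-color the tree of differencing steps
        k = stack.pop()
        for l in neighbors[k]:
            if spins[l] == 0:
                spins[l] = -spins[k]
                stack.append(l)
    return -heap[0][0], spins

def prepartition_decode(a, y):
    """Map class labels y to a spin configuration x = alpha(y)."""
    labels = sorted(set(y))            # empty classes (zero sums) are dropped
    index = {label: c for c, label in enumerate(labels)}
    sums = [0] * len(labels)
    for a_i, y_i in zip(a, y):
        sums[index[y_i]] += a_i
    _, class_spins = karmarkar_karp(sums)
    x = [class_spins[index[y_i]] for y_i in y]
    return x if x[0] == 1 else [-s for s in x]

def npp_energy(a, x):
    """Deviation from equal subset sums, Eq. (1)."""
    return abs(sum(a_i * x_i for a_i, x_i in zip(a, x)))

def step_length(a, y, z):
    """Hamming distance between alpha(y) and alpha(z)."""
    x_y = prepartition_decode(a, y)
    x_z = prepartition_decode(a, z)
    return sum(1 for s, t in zip(x_y, x_z) if s != t)
```

A neutral move is then simply a pair of neighboring encodings with `step_length(a, y, z) == 0`.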



F. Evolutionary dynamics 

FIG. 2: Neutrality and encoded step length. (a) The fraction of neutral neighbors as a function of problem size. (b) The
cumulative distribution of the distance moved in the original landscape by a single step in the encoding. Solid curves are for
the max-cut, dashed curves for the number partitioning problem, with curve thickness distinguishing values of problem size n.
For both plots (a) and (b), data have been obtained by uniform sampling of $10^4$ neighboring state pairs on $10^2$ independently
generated instances of each type of landscape.

One might ask if the encoded landscape also facilitates
the search dynamics, by virtue of its modified
structure, and offers another avenue for optimization.
For this purpose, we consider an optimization dynamics
as a zero-temperature Markov chain x(0), x(1), x(2), ....
At each time step t, a proposal x' is drawn at random.
If $f(x') \le f(x(t))$, we set x(t+1) = x', otherwise
x(t+1) = x(t). This is an Adaptive Walk (AW) when
the proposal x' is drawn from the neighborhood of x(t).
In Randomly Generate and Test (RGT), proposals are
drawn from the whole set of configurations independently
of the neighborhood structure. Thus a performance
comparison between AW and RGT elucidates if the move set
is suitably chosen for optimization. Because of the
enrichment of low energy states by the encodings, it is clear
that RGT performs strictly better on the encoding than
on the original landscape.
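
Both dynamics share the same acceptance rule and differ only in how proposals are drawn, which the following Python sketch makes explicit for the direct NPP landscape. The names are hypothetical, and accepting proposals of equal energy (so that neutral moves are allowed) is an assumption of this sketch.

```python
import random

def zero_temperature_search(energy, initial, propose, steps, rng):
    """Run the chain x(0), x(1), ... and return the final state and energy trace."""
    state, e = initial, energy(initial)
    trace = [e]
    for _ in range(steps):
        candidate = propose(state, rng)
        e_new = energy(candidate)
        if e_new <= e:                 # assumed rule: equal energy is accepted
            state, e = candidate, e_new
        trace.append(e)
    return state, trace

def npp_energy(a):
    """Energy function of a number partitioning instance a, Eq. (1)."""
    return lambda x: abs(sum(a_i * x_i for a_i, x_i in zip(a, x)))

def flip_one_spin(x, rng):
    """Adaptive walk proposal: a uniformly chosen neighbor on the hypercube."""
    k = rng.randrange(len(x))
    return x[:k] + (-x[k],) + x[k + 1:]

def uniform_configuration(x, rng):
    """RGT proposal: a uniformly drawn configuration, ignoring the current state."""
    return tuple(rng.choice((-1, 1)) for _ in x)

if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.random() for _ in range(20)]
    x0 = tuple(rng.choice((-1, 1)) for _ in a)
    f = npp_energy(a)
    _, aw = zero_temperature_search(f, x0, flip_one_spin, 1000, rng)
    _, rgt = zero_temperature_search(f, x0, uniform_configuration, 1000, rng)
    print("AW final energy:", aw[-1], "RGT final energy:", rgt[-1])
```

The same driver applies to an encoded landscape by passing the composed energy of the decoded state together with a proposal function defined on Y.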

Adaptive walks also perform strictly better on the
encoding than on the original landscape, at least in the
long-time limit, cf. Figure 3. Beyond this general
benefit of the encodings, the dynamics shows marked
differences across the three optimization problems. In the
NPP, RGT outperforms AW on the encoded
landscape, so that enrichment alone is responsible for the
improvement in optimization performance with respect to
the original landscape. In the encodings of the other two
problems, AW performs better than RGT so that we can
conclude that the improved structure of the encoded
landscape is also an important reason for the observed
increase in performance, in addition to simple enrichment.
The dynamics on the max-cut landscapes (panel c) has the
same qualitative behavior as that on the TSP (panel a).
Although there is a transient for intermediate times where
adaptive walks on the original landscape seem to be winning,
the asymptotic behavior is clear: adaptive walks on the
encoded landscape perform best.



G. Conclusion 

We have examined the role of encodings in arriving 
at optimal solutions to NP-complete problems: we have
constructed encodings for three examples, viz. the NPP, 
Spin-Glass and TSP problems, and demonstrated that 
the choice of a good encoding can indeed help optimiza- 
tion. In the examples we have chosen, the benefits arise 
primarily as a result of the enrichment of low-energy so- 
lutions. A secondary effect in some but not all encodings 
considered here is the introduction of a high degree of 
neutrality. The latter enables a diffusion-like mode of 
search that can be much more efficient than the combi- 
nation of fast hill-climbing and exponentially rare jumps 
from local optima. The two criteria, (1) selective en- 
richment of low energy states and, where possible, (2) 
increase of local degeneracy, can guide the construction 
of alternative encodings explicitly making use of a priori 
knowledge on the mathematical structure of the optimization
problem. The qualitative understanding of the effect of
encodings on landscape structures in particular resolves 
apparently conflicting "design guidelines" for the con- 
struction of evolutionary algorithms. 

FIG. 3: Performance comparison between three types of stochastic dynamics: adaptive walks (AW) on the original
(□) and encoded (o) landscapes and randomly generate and test (RGT) on the encoded landscape (o). The plotted performance
value is the fraction of instances for which the considered evolutionary dynamics is "leading" at time t, i.e. has an energy not
larger than the other two types of dynamics. For each landscape, 100 random instances are used with sizes n = 30 in panels
(a) and (b), n = 200 in panel (c). On each of the instances, each type of evolutionary dynamics is run once with randomly
drawn initial condition $y(0) \in Y$ for RGT and AW in the encoded landscape. The AW on the original landscape is initialized
with the mapped state $x(0) = \alpha(y(0))$. Thus all three dynamics are started at the same energy.

The beneficial effects of enriching encodings immediately
pose the question whether there is a generic way
in which they can be constructed. The constructions
for the NPP and TSP encodings suggest one rather
general design principle. Suppose there is a natural way of
decomposing a solution x of the original problem into
partial solutions. We can think of a partial solution $\xi$
as the set of all solutions that have a particular
property. In the TSP example, $\xi$ refers to a set of solutions
in which a certain list A of cities appears as an
uninterrupted interval. Now we choose the encoding y so that
it has an interpretation as a collection $\Xi(y)$ of partial
solutions. A deterministic optimization heuristic can now
be used to determine a good solution $x^*(\Xi(y))$. In the
case of the TSP, $\Xi(y)$ corresponds to a set of constrained
tours from which we choose by a greedy heuristic.
Alternatively, $\Xi(y)$ may over-specify a solution, in which case
the optimization procedure would attempt to extract an
optimal subset $\Xi' \subset \Xi(y)$ so that
$\bigcap_{\xi \in \Xi'} \xi$ contains a valid solution $x^*$.
In either case, $\alpha: y \mapsto x^*$ is an
encoding that is likely to favour low-energy states. It is
not obvious, however, that the spanning-tree encoding
for max-cut can also be understood as a combination of
partial solutions. It remains an important question for
future research to derive necessary and sufficient
conditions under which optimized combinations of partial
solutions indeed guarantee that the encoding is enriching.

Methods 

Landscapes and encoding 

A finite discrete energy landscape (X, L, f) consists of
a finite set of configurations X endowed with an
adjacency structure L and with a function $f: X \to \mathbb{R}$ called
energy; correspondingly, $-f$ is called fitness. The global
minima of f are called ground states. L is a set of unordered
pairs in X, thus (X, L) is a simple undirected graph. Let
(Y, M) be another simple graph and consider a mapping
$\alpha: Y \to X \cup \{\emptyset\}$, which we call an encoding of X.
Then $(Y, M, f \circ \alpha)$ is again a landscape. (If we include
states in Y that do not encode feasible solutions we assign
them infinite energy, i.e., $f \circ \alpha(y) = +\infty$ if $\alpha(y) = \emptyset$.)
The encoding is complete if $\alpha$ is surjective, i.e., if every
$x \in X$ is encoded by at least one vertex $y \in Y$. Both
landscapes then describe the same optimization problem.
In the language of evolutionary computation, (Y, M) is
the genotype space, while (X, L) is the phenotype space
corresponding to the "direct encoding" of the problem.
With this notation fixed, our problem reduces to
understanding the differences between the genotypic landscape
$(Y, M, f \circ \alpha)$ and the phenotypic landscape (X, L, f) with
respect to optimization dynamics.

Test Instances 

Random instances for max-cut (spin glass) are generated
as standard random graphs [31] with parameter
p = 0.5: each potential edge is present or absent with
equal probability, independently of other edges.
Distances $d_{ij} = d_{ji}$ for the symmetric TSP and numbers
$a_i$ for NPP are drawn independently from the uniform
distribution on the interval [0, 1].
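
A minimal Python sketch of instance generators following this description is given below. The function names are hypothetical, `random.random()` draws from the half-open interval [0, 1), and the connectivity of the random graph, which the spanning-tree encoding requires, is not checked here.

```python
import random

def random_maxcut_instance(n, rng, p=0.5):
    """Edge list of a random graph: each pair (i, j), i < j, is kept with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def random_tsp_instance(n, rng):
    """Symmetric distance matrix with independent uniform entries above the diagonal."""
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = rng.random()
    return d

def random_npp_instance(n, rng):
    """List of n independent uniform numbers to be partitioned."""
    return [rng.random() for _ in range(n)]
```

Passing an explicit `random.Random(seed)` object as `rng` keeps the generated instances reproducible.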



Enrichment factor and Density of States 

The enrichment factor r(h) can be obtained directly
from the cumulative densities of states of the two
landscapes:

$$r(h) = Q_{f \circ \alpha}\left(Q_f^{-1}(h)\right). \qquad (6)$$

This expression is a well-defined function for arguments
$h \in [0, 1]$ because $Q_{f \circ \alpha}$ only changes value where $Q_f$ also
does. For ground state energy $\eta_0$, the enrichment of the
ground state is $Q_{f \circ \alpha}(\eta_0) / Q_f(\eta_0)$.

The results in Figure 1(a-c) are obtained by sampling
$2 \times 10^7$ uniformly drawn states each from the original
states X and the prepartitionings Y for the traveling
salesman. For the two other problems, the density of
states of the original landscapes is exact by complete
enumeration. For the spin glass, the density of states
for Y is also exact, from a calculation based on the
matrix-tree theorem. For number partitioning, $2^n$ samples
in Y are drawn at random.
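
The matrix-tree theorem states that the number of spanning trees of a graph, here the size of Y, equals any cofactor of its Laplacian matrix. How the full density of states is derived from it is not detailed in the text; the Python sketch below, with a hypothetical function name, only illustrates the counting step using exact rational arithmetic.

```python
from fractions import Fraction

def count_spanning_trees(n, edges):
    """Number of spanning trees of a simple graph on nodes 0..n-1 (matrix-tree theorem)."""
    laplacian = [[Fraction(0)] * n for _ in range(n)]
    for i, j in edges:
        laplacian[i][i] += 1
        laplacian[j][j] += 1
        laplacian[i][j] -= 1
        laplacian[j][i] -= 1
    # Delete row 0 and column 0, then take the determinant by Gaussian elimination.
    minor = [row[1:] for row in laplacian[1:]]
    size = n - 1
    det = Fraction(1)
    for c in range(size):
        pivot = next((r for r in range(c, size) if minor[r][c] != 0), None)
        if pivot is None:
            return 0                   # singular minor: the graph is disconnected
        if pivot != c:
            minor[c], minor[pivot] = minor[pivot], minor[c]
            det = -det
        det *= minor[c][c]
        for r in range(c + 1, size):
            factor = minor[r][c] / minor[c][c]
            for k in range(c, size):
                minor[r][k] -= factor * minor[c][k]
    return int(det)
```

As a check, a cycle on four nodes has four spanning trees and the complete graph on four nodes has sixteen.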

The enrichment of the ground state, Figure 1(d), is an
average over 100 realizations for each problem type and
size n. For each realization of number partitioning and
max-cut, $2^n$ uniform samples in Y are taken; the ground
state energy itself is obtained by complete enumeration of
X. For each realization of the traveling salesman
problem, $10^9$ uniform samples are taken in Y; the ground
state energy is computed with the Held-Karp algorithm